
🤖 feat: add deterministic stream guardrails for verification and doom loops #2476

Open

ibetitsmike wants to merge 8 commits into main from mike/harness-guardrails

@ibetitsmike (Contributor) commented Feb 18, 2026

Summary

Adds two deterministic harness guardrails to the agent loop that enforce better agent behavior via the tool pipeline (not just prompting):

  1. Pre-completion verification guard: gates agent_report, rejecting completion when the agent edited files but never ran any validation commands (tests, typecheck, lint). A second attempt is allowed through as an escape hatch.
  2. Doom-loop detection: tracks per-file edit counts during a stream and injects a model-only nudge when the same file is edited 7+ times, telling the agent to step back and reconsider its approach.

Implementation

New per-stream tracker classes

  • StreamEditTracker: counts edits per file path and supports a one-time nudge per file per stream
  • StreamVerificationTracker: tracks whether any validation-like bash commands were run, with a one-time nudge-then-allow-through lifecycle
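Below is a minimal TypeScript sketch of the StreamEditTracker behavior described above. It assumes the counts live in a Map keyed by file path. Only recordEdit() is named elsewhere in this PR. getEditCount(), hasEdits(), shouldNudge(), and the constructor parameter are hypothetical, and this is not the PR's actual implementation.

```typescript
// Hypothetical sketch of a per-stream edit counter. recordEdit() is the
// only method named in the PR; the rest is illustrative.
class StreamEditTracker {
  private readonly counts = new Map<string, number>();
  private readonly nudged = new Set<string>();
  private readonly threshold: number;

  constructor(threshold = 7) {
    this.threshold = threshold;
  }

  /** Records one successful edit and returns the new count for that path. */
  recordEdit(filePath: string): number {
    const next = (this.counts.get(filePath) ?? 0) + 1;
    this.counts.set(filePath, next);
    return next;
  }

  getEditCount(filePath: string): number {
    return this.counts.get(filePath) ?? 0;
  }

  /** True if any file was edited during this stream. */
  hasEdits(): boolean {
    return this.counts.size > 0;
  }

  /**
   * Returns true exactly once per path: the first time it is called after
   * the count reaches the threshold. Later calls for that path return false.
   */
  shouldNudge(filePath: string): boolean {
    if (this.nudged.has(filePath) || this.getEditCount(filePath) < this.threshold) {
      return false;
    }
    this.nudged.add(filePath);
    return true;
  }
}
```

Because shouldNudge() returns true only once per path, the nudge does not repeat on the eighth and later edits to the same file.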

Both are instantiated per-stream in aiService.ts and threaded through ToolConfiguration to tool factories.
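The per-stream wiring could look roughly like this, assuming ToolConfiguration has optional tracker fields. The field names, the Simple* classes, buildStreamToolConfig(), and createFileEditTool() are all hypothetical. Creating a fresh pair of trackers for each stream keeps the counts isolated. Making the fields optional turns the guardrails into a no-op for callers that omit them.

```typescript
// Hypothetical shapes: the PR names ToolConfiguration and the tracker
// classes, but these fields and helpers are illustrative only.
interface EditTrackerLike {
  recordEdit(filePath: string): number;
}
interface VerificationTrackerLike {
  markValidationAttempted(): void;
}

interface ToolConfiguration {
  cwd: string;
  // Optional: callers that omit these (e.g. direct IPC tool calls)
  // keep the pre-existing behavior with no guardrails.
  editTracker?: EditTrackerLike;
  verificationTracker?: VerificationTrackerLike;
}

class SimpleEditTracker implements EditTrackerLike {
  private readonly counts = new Map<string, number>();
  recordEdit(filePath: string): number {
    const n = (this.counts.get(filePath) ?? 0) + 1;
    this.counts.set(filePath, n);
    return n;
  }
}

class SimpleVerificationTracker implements VerificationTrackerLike {
  attempted = false;
  markValidationAttempted(): void {
    this.attempted = true;
  }
}

// Called once at the start of each stream, so state never leaks between streams.
function buildStreamToolConfig(cwd: string): ToolConfiguration {
  return {
    cwd,
    editTracker: new SimpleEditTracker(),
    verificationTracker: new SimpleVerificationTracker(),
  };
}

// A tool factory reads the tracker from the config; optional chaining
// makes the guardrail a no-op when no tracker was provided.
function createFileEditTool(config: ToolConfiguration): (filePath: string) => number | undefined {
  return (filePath) => config.editTracker?.recordEdit(filePath);
}
```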

Verification guard (agent_report)

  • Before returning { success: true }, checks whether files were edited and no validation was attempted
  • First attempt: throws with a clear error instructing the agent to run validation
  • Second attempt: lets the report through (an escape hatch for tasks where validation isn't applicable)
  • Detection uses regex patterns matching common validation commands (make test, bun test, vitest, tsc, run_and_report, etc.)
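Here is a sketch of the nudge-then-allow-through lifecycle for agent_report. It assumes the guard needs only two booleans: whether validation was attempted, and whether the one-time block has been used. StreamVerificationTracker is named in the PR. Its methods here (markValidationAttempted(), hasValidationAttempted(), consumeBlock()), the agentReport() signature, and the error text are hypothetical.

```typescript
// Hypothetical sketch; the PR describes the behavior, not these methods.
class StreamVerificationTracker {
  private validationAttempted = false;
  private blockedOnce = false;

  markValidationAttempted(): void {
    this.validationAttempted = true;
  }

  hasValidationAttempted(): boolean {
    return this.validationAttempted;
  }

  /** Returns true the first time it is called, false on every later call. */
  consumeBlock(): boolean {
    if (this.blockedOnce) return false;
    this.blockedOnce = true;
    return true;
  }
}

// Hypothetical guard run by agent_report before returning success.
function agentReport(
  hasEdits: boolean,
  tracker: StreamVerificationTracker | undefined,
): { success: true } {
  if (
    tracker !== undefined &&
    hasEdits &&
    !tracker.hasValidationAttempted() &&
    tracker.consumeBlock() // checked last so the one-time block is only spent when rejecting
  ) {
    throw new Error(
      "Files were edited but no validation (tests, typecheck, lint) was run. " +
        "Run validation before reporting completion.",
    );
  }
  return { success: true };
}
```

The first unverified report throws. The second passes regardless, which is the escape hatch for tasks where validation does not apply.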

Doom-loop nudge (file_edit_operation)

  • After each successful file write, records the edit in the tracker
  • At the threshold (7 edits to the same file), attaches a <notification> via __mux_notifications (model-only, stripped before UI/persistence)
  • Nudges once per file per stream (no spam)
  • Skipped in plan-only mode
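This is a hedged sketch of the doom-loop step that runs after a successful write. __mux_notifications is named in the PR, but its real type is not shown, so string[] is an assumption. FileEditResult, EditTracker, attachDoomLoopNudge(), and the notification wording are also hypothetical. This sketch skips tracking entirely in plan-only mode. The PR only states that the nudge is skipped there.

```typescript
// Hypothetical result shape; the real type of __mux_notifications is not shown.
interface FileEditResult {
  success: boolean;
  diff: string;
  __mux_notifications?: string[];
}

interface EditTracker {
  recordEdit(filePath: string): number;
  shouldNudge(filePath: string): boolean; // true once per path at the threshold
}

function attachDoomLoopNudge(
  result: FileEditResult,
  filePath: string,
  tracker: EditTracker | undefined,
  planOnly: boolean,
): FileEditResult {
  // Only successful writes count; plan-only mode and callers without a tracker are untouched.
  if (!result.success || planOnly || tracker === undefined) return result;

  const count = tracker.recordEdit(filePath);
  if (!tracker.shouldNudge(filePath)) return result;

  const nudge =
    `<notification>You have edited ${filePath} ${count} times in this stream. ` +
    "Step back and reconsider your approach before editing it again.</notification>";
  // Return a copy so the original result object is not mutated.
  return {
    ...result,
    __mux_notifications: [...(result.__mux_notifications ?? []), nudge],
  };
}
```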

Safety

  • All behavior is opt-in, keyed on tracker presence in ToolConfiguration; IPC tool calls without trackers see no change
  • Uses existing __mux_notifications infrastructure (already tested for stripping before persistence/UI)
  • Conservative defaults: threshold 7, one nudge, one verification block then escape hatch

Validation

  • All new unit tests pass (StreamEditTracker, StreamVerificationTracker, agent_report, bash, file_edit_operation)
  • make typecheck
  • make lint
  • make fmt-check

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $1.44

@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29df8afc44

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@ibetitsmike
Contributor Author

@codex review

Addressed feedback: tightened run_and_report detection to only match when the wrapped command itself is a validation command. Added a negative test case for run_and_report install bun install.
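The narrowed run_and_report check might be sketched as follows. It assumes the wrapper takes the form run_and_report <step-name> <command...>, as in the examples in this thread. VALIDATION_COMMAND is an illustrative subset, not the PR's actual pattern list, and isRunAndReportValidation() is a hypothetical name.

```typescript
// Illustrative subset of validation commands, not the PR's real list.
const VALIDATION_COMMAND = /^(?:make\s+(?:test|typecheck|lint)|bun\s+test|vitest|tsc)\b/;

// Hypothetical helper. Assumed form: run_and_report <step-name> <command...>.
// The wrapper is generic, so only the wrapped command decides whether
// this invocation counts as validation.
function isRunAndReportValidation(command: string): boolean {
  const wrapped = /^run_and_report\s+\S+\s+(.+)$/.exec(command.trim())?.[1];
  return wrapped !== undefined && VALIDATION_COMMAND.test(wrapped);
}
```

With this check, run_and_report typecheck make typecheck counts as validation, but run_and_report install bun install does not.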

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: e99336f295

@ibetitsmike
Contributor Author

@codex review

Addressed: new-file creation in file_edit_insert.ts now records edits via editTracker.recordEdit() in the create-file branch, ensuring both guardrails cover file creation workflows.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: a503345873

@ibetitsmike
Contributor Author

@codex review

Addressed: validation commands are now recognized after shell operators (&&, ||, ;, |), covering monorepo workflows like cd packages/app && make test and run_and_report unit cd packages/app && bun test.
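Operator-aware matching can be sketched with a single regex, assuming a validation command may appear at the start of a line or immediately after &&, ||, ;, or |. The pattern and containsValidationCommand() are illustrative, not the PR's code.

```typescript
// Illustrative pattern: a validation command at line start (m flag) or right
// after a shell operator. "\|\|" is listed before "\|" for readability;
// the regex engine would backtrack correctly either way.
const STANDALONE_VALIDATION =
  /(?:^|&&|\|\||;|\|)\s*(?:make\s+(?:test|typecheck|lint)|bun\s+test|vitest|tsc)\b/m;

// Hypothetical helper name.
function containsValidationCommand(command: string): boolean {
  return STANDALONE_VALIDATION.test(command);
}
```

Because echo make test has no operator before make, it is not counted. The trade-off is that prefixed forms such as env CI=1 make test are also missed. That is the false-negative case the agent_report escape hatch is meant to absorb.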

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: ece687522d

@ibetitsmike
Contributor Author

@codex review

Addressed: removed .*? from run_and_report regex so only the actual command (third word) is checked. Added negative test for run_and_report note echo make test. Chained commands like run_and_report unit cd app && bun test still work because && bun test is caught by the standalone pattern.
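The effect of removing .*? can be shown with two illustrative patterns, assuming the same run_and_report <step-name> <command...> form. LOOSE and STRICT are hypothetical names, and the command list is a subset.

```typescript
// Before: the lazy .*? lets a validation keyword appear anywhere after the
// step name, so "run_and_report note echo make test" matches.
const LOOSE =
  /^run_and_report\s+\S+\s+.*?(?:make\s+(?:test|typecheck|lint)|bun\s+test|vitest|tsc)\b/;

// After: the validation command must be the wrapped command itself,
// i.e. it must start at the third word.
const STRICT =
  /^run_and_report\s+\S+\s+(?:make\s+(?:test|typecheck|lint)|bun\s+test|vitest|tsc)\b/;
```

On its own, STRICT rejects run_and_report unit cd app && bun test. As the comment above notes, that chained form is credited by the standalone, operator-aware pattern instead.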

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: e27c3a7853

@ibetitsmike
Contributor Author

@codex review

Addressed with a clarifying code comment. Shell command parsing with regex is inherently imprecise: environment prefixes (env CI=1), shell wrappers (bash -c), and similar forms create an endless series of edge cases that regex alone can't solve. The agent_report escape hatch (the second attempt always passes) is the designed safety net for any false negatives, costing at most one harmless retry. The current heuristic handles the common cases correctly (direct commands, &&/; chains, and run_and_report with direct validation commands).

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 16a071c69a

@ibetitsmike
Contributor Author

Pausing the review loop here and leaving this for human direction.

Re: bash-based edit tracking — This was deliberately scoped out in the original plan. Tracking file edits via bash would require parsing arbitrary shell commands to figure out which files they write to (redirections, sed, git, formatters, etc.), which is fundamentally unreliable and a much larger, separate problem.

The file_edit_* tools are the primary editing mechanism for agents — the vast majority of code changes go through them. Covering those handles the common case. Bash-based edits are an inherent limitation that's acknowledged and can be addressed as a follow-up if data shows it's needed.

The alternative (heuristically guessing which files bash commands modify) would introduce a high false-positive rate and make the guardrail unreliable in the other direction.

…mand

run_and_report is a generic wrapper, not inherently a validation command.
Only match when the wrapped command itself is a validation command
(e.g., run_and_report typecheck make typecheck).

The create-file branch in file_edit_insert.ts bypassed executeFileEditOperation,
so new files weren't counted by the edit tracker. This meant a stream that only
created files could skip verification.

Match validation commands like 'cd packages/app && make test' and
'source .env; bun test' by accepting shell operators as command prefixes
in addition to line start.

Remove .*? from the run_and_report pattern so that only the third word
(the actual command) is checked against validation patterns. Chained
commands after && are already caught by the standalone pattern.

Shell command parsing with regex is inherently imperfect. The escape hatch
(second agent_report always passes) covers false negatives from env prefixes
and shell wrappers.
@ibetitsmike force-pushed the mike/harness-guardrails branch from 16a071c to b5757d4 on February 18, 2026 08:58
@ibetitsmike
Contributor Author

@codex review


@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: b5757d4d7e

@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: b5757d4d7e

@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 95d45718c2

Background bash commands (run_in_background=true) haven't produced
results yet when they start, so they shouldn't count as 'validation
attempted' for the pre-completion verification guard.
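Excluding background runs from verification credit might look like the sketch below. It assumes the bash tool input carries a run_in_background flag, as the commit message describes. BashInput, its script field, VerificationTracker, and creditValidation() are hypothetical, and the regex is an illustrative subset.

```typescript
// Hypothetical bash tool input; only run_in_background is named in the PR.
interface BashInput {
  script: string;
  run_in_background?: boolean;
}

interface VerificationTracker {
  markValidationAttempted(): void;
}

// Illustrative subset of validation commands (line start or after a shell operator).
const VALIDATION =
  /(?:^|&&|\|\||;|\|)\s*(?:make\s+(?:test|typecheck|lint)|bun\s+test|vitest|tsc)\b/m;

// Returns true when this call was credited as a validation attempt.
function creditValidation(input: BashInput, tracker: VerificationTracker | undefined): boolean {
  // A background job has produced no results when the tool call returns,
  // so it must not satisfy the pre-completion guard.
  if (input.run_in_background === true) return false;
  if (!VALIDATION.test(input.script)) return false;
  tracker?.markValidationAttempted();
  return true;
}
```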
@ibetitsmike force-pushed the mike/harness-guardrails branch from 95d4571 to 07e9e67 on February 18, 2026 09:25
@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: 07e9e67fcd

…n't count

Previously, running make test before editing files would permanently
satisfy the verification guard. Now file edits reset the validation
tracker, ensuring only post-edit validation counts.
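The reset behavior can be sketched with one flag for edits and one for post-edit validation. VerificationState and its methods are hypothetical names, not the PR's classes.

```typescript
// Hypothetical sketch: every file edit invalidates earlier validation, so
// only validation run after the most recent edit satisfies the guard.
class VerificationState {
  private edited = false;
  private validatedSinceLastEdit = false;

  recordEdit(): void {
    this.edited = true;
    this.validatedSinceLastEdit = false; // pre-edit validation no longer counts
  }

  markValidationAttempted(): void {
    this.validatedSinceLastEdit = true;
  }

  /** True when files were edited and nothing was validated afterwards. */
  needsValidation(): boolean {
    return this.edited && !this.validatedSinceLastEdit;
  }
}
```

For example, running make test and then editing a file leaves needsValidation() true until the agent validates again.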
@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: f8b64a38ff

@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: f8b64a38ff

@ibetitsmike
Contributor Author

Note for reviewers

Codex raises the same P1/P2 finding about tracking bash-based file edits (redirections, sed -i, git commands, etc.) in every review round; it has now been raised and resolved 6+ times.

This is an intentional design decision:

  • Reliably detecting file mutations from arbitrary bash scripts is infeasible with static regex
  • The escape hatch (second agent_report call always passes) handles this case
  • The guardrail catches the common case (tool-based file_edit_* edits) deterministically
  • Conservative defaults: one nudge, one block, then allow through

All other Codex feedback was addressed with code changes:

  • ✅ Narrowed run_and_report regex to check actual wrapped command
  • ✅ Tracked new-file creation in file_edit_insert
  • ✅ Recognized validation commands after shell operators (&&, ||, ;, |)
  • ✅ Excluded background commands from verification credit
  • ✅ Reset validation state on file edits (pre-edit validation no longer counts)

CI checks are all passing. Ready for human review.

@ibetitsmike
Contributor Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Reviewed commit: f8b64a38ff

Labels

None yet

Projects

None yet

1 participant